Course 7001 Mini Project Performance Evaluation of Hadoop on Virtual Machines

نویسنده

  • Yuzhe Tang
چکیده

MapReduce[1] is a popular programming framework that is intended for automatical paralellization of computation in the cloud. MapReduce deals with data intensive applications; huge amount of data is first loaded from remote DFS, then copied as intermediate results from Mapper to Reducer, and finally written back to DFS. Along with this large amount of data transfer, many I/O operations are incurred across MapReduce instances.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient and Parallel Data Processing and Resource Allocation in the Cloud by using Nephele’s Data Processing Framework

Cloud computing is a technology in which the Cloud Service Providers (CSP) provide many virtual servers to the users to store their information in the cloud. The faults occurring on the assignment and dismission of the virtual machines, the processing cost in the allocation of resources must also be considered. The parallel processing of the information on the virtual machines must be done effe...

متن کامل

Performance and energy efficiency of big data applications in cloud environments: A Hadoop case study

The exponential growth of scientific and business data has resulted in the evolution of the cloud computing environments and the MapReduce parallel programming model. The focus of cloud computing is increased utilization and power savings through consolidation while MapReduce enables large scale data analysis. Hadoop, an open source implementation of MapReduce has gained popularity in the last ...

متن کامل

Morpho: A decoupled MapReduce framework for elastic cloud computing

MapReduce as a service enjoyswide adoption in commercial clouds today [3,23]. Butmost cloud providers just deploy native Hadoop [24] systems on their cloud platforms to provide MapReduce services without any adaptation to these virtualized environments [6,25]. In cloud environments, the basic executing units of data processing are virtual machines. Each user’s virtual cluster needs to deploy HD...

متن کامل

Analytical evaluation of an innovative decision-making algorithm for VM live migration

In order to achieve the virtual machines live migration, the two "pre-copy" and "post-copy" strategies are presented. Each of these strategies, depending on the operating conditions of the machine, may perform better than the other. In this article, a new algorithm is presented that automatically decides how the virtual machine live migration takes place. In this approach, the virtual machine m...

متن کامل

Investigation of Storage Options for Scientific Computing on Grid and Cloud Facilities

In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009